About Metrics for Clone Detection
Clone detectors rely on similarity and dissimilarity measures to identify cloned fragments. The choice of a specific distance function in a clone detector is, to some extent, arbitrary. However, with a deeper knowledge of similarity measures, we can constrain this choice so that the chosen measure has properties that help improve the scalability and quality of tools. This paper presents some interesting results, insights, and questions about similarity and dissimilarity measures, including a somewhat counter-intuitive result on the cosine distance.
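The abstract does not state what its cosine result is. As a general, well-known illustration, and not necessarily the paper's finding, the commonly used cosine distance 1 - cos(theta) is not a metric because it can violate the triangle inequality. This matters for index structures that prune candidates using that inequality. The angle itself does satisfy it. The sketch below assumes fragments are represented as bags of tokens (hypothetical `Counter` vectors). All function names are illustrative.

```python
import math
from collections import Counter
from typing import Callable


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two non-empty token-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def cosine_distance(a: Counter, b: Counter) -> float:
    # Widely used "distance" 1 - cos(theta); NOT a metric in general.
    return 1.0 - cosine_similarity(a, b)


def angular_distance(a: Counter, b: Counter) -> float:
    # The angle itself, scaled to [0, 1]; it satisfies the triangle inequality.
    c = max(-1.0, min(1.0, cosine_similarity(a, b)))  # guard against rounding
    return math.acos(c) / math.pi


def violates_triangle(d: Callable[[Counter, Counter], float], x, y, z) -> bool:
    """True if d(x, z) > d(x, y) + d(y, z), with a small float tolerance."""
    return d(x, z) > d(x, y) + d(y, z) + 1e-12


if __name__ == "__main__":
    x = Counter(["if"])  # hypothetical token bags of three fragments
    y = Counter(["if", "return"])
    z = Counter(["return"])
    print(violates_triangle(cosine_distance, x, y, z))   # True: 1 > 0.293 + 0.293
    print(violates_triangle(angular_distance, x, y, z))  # False: 0.5 <= 0.25 + 0.25
```

A detector that relies on triangle-inequality pruning would therefore need the angular form (or a threshold translated into it) rather than 1 - cos(theta).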
06301 Abstracts Collection -- Duplication, Redundancy, and Similarity in Software
From 23.07.06 to 26.07.06, the Dagstuhl Seminar 06301 "Duplication, Redundancy, and Similarity in Software" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are collected in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided where available.
Comparison and Evaluation of Clone Detection Tools
Many techniques for detecting duplicated source code (software clones) have been proposed in the past. However, it is not yet clear how these techniques compare in terms of recall and precision, or in terms of space and time requirements. This paper presents an experiment that evaluates six clone detectors on eight large C and Java programs (altogether almost 850 KLOC). Their clone candidates were evaluated by one of the authors, acting as an independent third party. The selected techniques cover the whole spectrum of the state of the art in clone detection. The techniques work on text, lexical and syntactic information, software metrics, and program dependency graphs.
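The sketch below shows how recall and precision of clone candidates can be computed against a reference set. It makes two assumptions. First, a clone is a hypothetical unordered pair of fragments `(file, first_line, last_line)`. Second, a candidate counts only if it matches a reference pair exactly. The abstract does not describe the study's matching criteria, and evaluations in practice may also accept partially overlapping matches.

```python
from typing import FrozenSet, Set, Tuple

Fragment = Tuple[str, int, int]  # hypothetical: (file, first_line, last_line)
ClonePair = FrozenSet[Fragment]  # unordered pair of fragments


def pair(a: Fragment, b: Fragment) -> ClonePair:
    return frozenset((a, b))


def precision_recall(candidates: Set[ClonePair],
                     reference: Set[ClonePair]) -> Tuple[float, float]:
    """Exact-match precision and recall of candidate clone pairs."""
    hits = len(candidates & reference)  # candidates confirmed by the reference
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    a, b = ("a.c", 10, 20), ("b.c", 30, 40)
    c, d = ("a.c", 50, 60), ("c.c", 5, 15)
    reference = {pair(a, b), pair(c, d)}
    candidates = {pair(b, a), pair(a, c), pair(b, d)}
    print(precision_recall(candidates, reference))  # (0.3333333333333333, 0.5)
```

Storing pairs as `frozenset` makes `(a, b)` and `(b, a)` the same clone relation, so the order in which a tool reports fragments does not affect the score.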
Levenshtein edit distance-based type III clone detection using metric trees
This paper presents an original clone detection technique based on metric trees, using the Levenshtein distance as the metric between two code fragments. This approach achieves faster empirical performance. Clones can be found with varying thresholds, which allows type-3 clone detection. Experimental results on metric-tree performance, as well as clone detection statistics on an open source system, are presented and suggest promising perspectives.
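The abstract does not say which kind of metric tree is used. One concrete example is a BK-tree (Burkhard-Keller tree), which works with integer-valued metrics such as the Levenshtein distance over token sequences. A range query prunes subtrees using the triangle inequality, and its threshold plays the role of the type-3 clone threshold. This is a hedged sketch, not the paper's implementation. Whitespace tokenization and the fragment ids are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


class BKNode:
    def __init__(self, key: Tuple[str, ...], item: str):
        self.key = key
        self.items: List[str] = [item]           # fragments with identical tokens
        self.children: Dict[int, "BKNode"] = {}  # edge label = distance to child


class BKTree:
    """Burkhard-Keller tree: a metric tree for integer-valued metrics."""

    def __init__(self) -> None:
        self.root: Optional[BKNode] = None

    def add(self, tokens: Sequence[str], item: str) -> None:
        key = tuple(tokens)
        if self.root is None:
            self.root = BKNode(key, item)
            return
        node = self.root
        while True:
            d = levenshtein(key, node.key)
            if d == 0:
                node.items.append(item)
                return
            if d not in node.children:
                node.children[d] = BKNode(key, item)
                return
            node = node.children[d]

    def query(self, tokens: Sequence[str], threshold: int) -> List[Tuple[str, int]]:
        """Return (item, distance) for every fragment within `threshold` edits."""
        key = tuple(tokens)
        results: List[Tuple[str, int]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = levenshtein(key, node.key)
            if d <= threshold:
                results.extend((it, d) for it in node.items)
            # Triangle inequality: only children whose edge label lies in
            # [d - threshold, d + threshold] can contain matches.
            for edge, child in node.children.items():
                if d - threshold <= edge <= d + threshold:
                    stack.append(child)
        return results


if __name__ == "__main__":
    tree = BKTree()
    tree.add("if ( x > 0 ) return x ;".split(), "A")
    tree.add("if ( y > 0 ) return y ;".split(), "B")
    tree.add("while ( i < n ) i ++ ;".split(), "C")
    print(sorted(tree.query("if ( x > 0 ) return x ;".split(), threshold=2)))
    # [('A', 0), ('B', 2)]
```

Fragment "C" is never compared with the query, because its edge label (6) falls outside [0 - 2, 0 + 2]. This kind of pruning is where the empirical speed-up of a metric tree comes from.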
Detection of redundant clone relations based on clone subsumption
Clone detection has been presented in the literature at different levels of fragment granularity: from functions, to syntactic blocks, to variable-length strings of source code or tokens. String matching approaches, prefix and suffix trees, metrics, syntactic approaches, and others can be used to compare fragments for similarity.
Inclusion relations between source code lines may make some clone relations redundant when cloned code fragments subsume each other. This may occur, for example, between nested blocks of source code.
An original method to analyze this kind of redundancy in clone relations is presented. The proposed method efficiently combines clone subsumption information with clone similarity relations on code fragments.
The amount of redundancy in clone relations has been evaluated on two open source Java systems, Tomcat and Eclipse, and experimental results are presented. The execution time of the redundancy analysis is measured and reported. Results are discussed together with proposed further research.
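The abstract does not spell out the subsumption rule. The sketch below makes two assumptions. First, fragments are contiguous line ranges. Second, a clone pair is redundant when both of its fragments lie inside the corresponding fragments of another clone pair, as happens with nested blocks. The names `Fragment` and `redundant_pairs` are hypothetical. The naive quadratic scan is for clarity only; the paper describes an efficient combination whose details are not given here.

```python
from typing import List, NamedTuple, Set, Tuple


class Fragment(NamedTuple):
    file: str
    start: int  # first line, inclusive
    end: int    # last line, inclusive

    def within(self, other: "Fragment") -> bool:
        """True if this fragment's lines are contained in `other`'s lines."""
        return (self.file == other.file
                and other.start <= self.start and self.end <= other.end)


ClonePair = Tuple[Fragment, Fragment]


def subsumed(p: ClonePair, q: ClonePair) -> bool:
    """p is subsumed by q if each fragment of p lies inside a matching fragment of q."""
    (a, b), (c, d) = p, q
    return (a.within(c) and b.within(d)) or (a.within(d) and b.within(c))


def redundant_pairs(pairs: List[ClonePair]) -> Set[int]:
    """Indices of clone pairs subsumed by a different clone relation (naive O(n^2))."""
    redundant: Set[int] = set()
    for i, p in enumerate(pairs):
        for j, q in enumerate(pairs):
            # Skip the pair itself and the same relation listed in reverse order.
            if i == j or frozenset(p) == frozenset(q):
                continue
            if subsumed(p, q):
                redundant.add(i)
                break
    return redundant


if __name__ == "__main__":
    pairs = [
        (Fragment("A.java", 10, 40), Fragment("B.java", 100, 130)),  # outer clone
        (Fragment("A.java", 15, 25), Fragment("B.java", 105, 115)),  # nested block
        (Fragment("A.java", 50, 60), Fragment("C.java", 1, 11)),     # unrelated
    ]
    print(redundant_pairs(pairs))  # {1}
```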
Insider threat resistant SQL-injection prevention in PHP
Web sites are static sites, programs, or databases. Very often they are a mixture of these three, with a relational database as the back-end. Web sites require careful configuration and programming to ensure the security, confidentiality, and trustworthiness of the published information.
SQL-injection attacks rely on weak validation of the textual input used to build database queries. Maliciously crafted input may threaten the confidentiality and the security policies of Web sites that rely on a database to store and retrieve information.
Furthermore, insiders may introduce malicious code into a Web application: code that violates security policies when triggered, for example, by some specific input.
This paper presents an original approach that combines static analysis, dynamic analysis, and code reengineering to automatically protect applications written in PHP from both malicious input (outsider threats) and malicious code (insider threats) that carry SQL-injection attacks.
The paper also reports preliminary results of experiments performed on an old, SQL-injection-prone version of phpBB (version 2.0.0, 37,193 LOC of PHP 4.2.2 code). Results show that our approach successfully improved the resistance of phpBB 2.0.0 to SQL-injection attacks.
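The paper targets PHP, and this abstract does not detail its combined static, dynamic, and reengineering approach. As a hedged illustration of the vulnerability class only, the Python sketch below uses the standard `sqlite3` module. The table, data, and function names are hypothetical. It shows how a query built by string concatenation lets crafted input change the query's logic, while a parameterized query keeps the input as data.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "s3cret"), ("bob", "hunter2")])
    return conn


def login_vulnerable(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # Query text built by concatenation: the input becomes part of the SQL code.
    query = ("SELECT COUNT(*) FROM users WHERE name = '" + name +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchone()[0] > 0


def login_parameterized(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # Placeholders bind the input as values, so it cannot alter the query structure.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


if __name__ == "__main__":
    conn = setup()
    attack = "' OR '1'='1"
    print(login_vulnerable(conn, "alice", attack))     # True: the check is bypassed
    print(login_parameterized(conn, "alice", attack))  # False
```

With the attack string, the concatenated condition becomes `name = 'alice' AND password = '' OR '1'='1'`. Because `AND` binds tighter than `OR`, this is true for every row. Parameterization addresses malicious input, but by itself it does not address the insider case of code that deliberately builds unsafe queries.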
A feedback based quality assessment to support open source software evolution: the GRASS case study
Mapping features to source code in dynamically configured avionics software
Mapping software features to the code that implements them is an important activity for program comprehension and software reengineering. In this paper, we present a novel automated approach to locate features in source code based on static analysis and model checking. This approach focuses on dynamically configured software, in which the activation of specific features is controlled by configuration variables. The main advantages of a static approach to feature location are its affordability and its applicability to large systems containing hundreds of features. Our methodology is applied to an industrial Flight Management System from the avionics industry. Results show that a static approach to feature mapping is feasible and can locate complex features whose implementation is spread across multiple files and functions.
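The approach in the abstract combines static analysis with model checking. The sketch below replaces model checking with a much simpler reachability check, so it illustrates only the underlying idea under heavy assumptions. It uses a hypothetical program model in which each call may be guarded by a boolean configuration variable. It then labels as feature code the functions that are reachable only when that variable is enabled. All function and variable names are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model: function -> list of (guard, callee); guard None = unconditional.
Program = Dict[str, List[Tuple[Optional[str], str]]]


def reachable(program: Program, entry: str, config: Dict[str, bool]) -> Set[str]:
    """Functions reachable from `entry` when call guards are evaluated under `config`."""
    seen: Set[str] = set()
    stack = [entry]
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        for guard, callee in program.get(fn, []):
            if guard is None or config.get(guard, False):
                stack.append(callee)
    return seen


def locate_feature(program: Program, entry: str,
                   config: Dict[str, bool], feature_var: str) -> Set[str]:
    """Functions reachable only when `feature_var` is enabled (other settings fixed)."""
    on = reachable(program, entry, {**config, feature_var: True})
    off = reachable(program, entry, {**config, feature_var: False})
    return on - off


if __name__ == "__main__":
    program: Program = {
        "main": [(None, "init"), ("HOLD_PATTERN", "compute_hold"), (None, "log")],
        "compute_hold": [(None, "draw_racetrack"), (None, "log")],
    }
    print(sorted(locate_feature(program, "main", {}, "HOLD_PATTERN")))
    # ['compute_hold', 'draw_racetrack']
```

Shared code such as `log` is reachable in both configurations, so it is not attributed to the feature. The example also shows how a feature's implementation can span several functions.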
How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review
Context: Machine Learning (ML) has been at the heart of many innovations in recent years. However, including it in so-called 'safety-critical' systems, such as automotive or aeronautic systems, has proven to be very challenging, since the paradigm shift that ML brings completely changes traditional certification approaches.
Objective: This paper aims to elucidate the challenges related to the certification of ML-based safety-critical systems, as well as the solutions proposed in the literature to tackle them, answering the question 'How to Certify Machine Learning Based Safety-critical Systems?'.
Method: We conduct a Systematic Literature Review (SLR) of research papers published between 2015 and 2020, covering topics related to the certification of ML systems. In total, we identified 217 papers covering topics considered to be the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification. We analyzed the main trends and problems of each sub-field and provided summaries of the extracted papers.
Results: The SLR results highlighted the enthusiasm of the community for this subject, as well as the lack of diversity in terms of datasets and types of models. They also emphasized the need to further develop connections between academia and industry to deepen the domain study. Finally, they illustrated the necessity of building connections between the above-mentioned main pillars, which are for now mostly studied separately.
Conclusion: We highlighted current efforts deployed to enable the certification of ML-based software systems and discussed some future research directions.
Comment: 60 pages (92 pages with references and complements), submitted to a journal (Automated Software Engineering). Changes: Emphasizing the difference between traditional software engineering and ML approaches. Adding Related Works, Threats to Validity, and Complementary Materials. Adding a table listing the referenced papers for each section/subsection.